Abstract

We present an online approach to portfolio selection. The motivation is within the context of algorithmic trading, which demands fast and recursive updates of portfolio allocations as new data arrive. In particular, we look at two online algorithms: Robust-Exponentially Weighted Recursive Least Squares (R-EWRLS) and a regularized Online minimum Variance algorithm (O-VAR). Our methods use simple ideas from signal processing and statistics, which are sometimes overlooked in the empirical financial literature. The two approaches are evaluated against benchmark allocation techniques using 4 real datasets. Our methods outperform the benchmark allocation techniques on these datasets, in terms of both computational demand and financial performance.

Keywords: Portfolio Selection, Mean-Variance Portfolios, Adaptive Filtering, Robust, Online, Investment Management.

1 Introduction 


In portfolio allocation problems, investors aim to optimize the return on invested capital, according to some cost function, by allocating a fraction of the capital to each of a number of different assets. In the long-established mean-variance theory of asset allocation (see Markowitz (1952)), the fraction of the capital invested in each asset is known as the portfolio weight, and all weights together form a linear combination (portfolio) that is optimal when the expected return of the portfolio is maximized for a fixed level of portfolio variance. The approach argues that maximizing expected returns does not guarantee that the portfolio will have the smallest variance. Hence, a trade-off between the expected return and the variance of the portfolio provides a more effective diversification of investors' funds. Investors are considered risk averse and would prefer the portfolio with the smallest risk when expected returns are equal. Moreover, smaller variance is a desirable attribute of a portfolio, as investors could use leverage, increasing the capital allocation so that the portfolio achieves a higher return on capital. Although mean-variance theory initially generated little interest, it is now a mainstream theory whose principles are constantly revisited and reinvented. We also note that the terms assets and instruments are used interchangeably in this text and denote available investment vehicles.

However, in mean-variance optimization it is well known that the portfolio weights can be highly unstable. This is due to the difficulty of estimating expected returns; see Merton (1980). As a result, there has been a substantial amount of recent interest in improving estimation procedures, including: Baltutis (2009); DeMiguel & Nogales (2009); DeMiguel et al. (2009a,b); Fabozzi et al. (2007); Fabozzi et al. (2009); Jagannathan & Ma (2003); Ledoit & Wolf (2003, 2004). This work ranges from imposing constraints on the optimization function to robust portfolio estimation procedures. Whilst these publications are vital to the understanding of portfolio allocation, they are mainly concerned with batch procedures (which require a full set of historical observations), as opposed to online techniques, which are equipped with recursive estimation mechanisms. Batch procedures are not necessarily designed to be computationally efficient, to address the streaming nature of financial data, or to handle the high dimensionality of the set of assets available for allocation.

We approach the asset allocation problem from the algorithmic trading perspective, that is, when investment decisions regarding allocations are taken automatically by investment allocation algorithms, as soon as data arrive. Algorithmic trading, otherwise known as automated or systematic trading, refers to the use of algorithms to conduct trading without any human intervention. As an example, in 2006, one-fifth of global equity trading was administered through algorithmic techniques (Keehner (2007)). Such transactions are executed within a few milliseconds, and any latency can make the difference between a profitable and a loss-making trade. In this context batch algorithms are unsuitable and we must consider online procedures. One such consideration is the implementation of algorithms relating to portfolio optimization.

In this study we use ideas from mean-variance theory to automate the process of portfolio optimization. In particular, we make use of the algebraic link between classic mean-variance theory and Ordinary Least Squares (OLS) to allocate capital among various assets. We construct these algorithms bearing in mind certain considerations with regard to the efficiency of trading and the characteristics of financial data. These algorithms may account for one or more of the following attributes:

• Adaptive: They have the ability to adapt to non-stationary market environments. By dynamically incorporating new information into the portfolio weights, one is likely to improve the financial performance of the resulting algorithms.

• Robust: They are able to counter the adverse effect of outliers in estimation.

• Regularized: They have mechanisms to reduce the high level of noise exhibited in financial data, either through direct regularization or through dimensionality reduction techniques.

• Efficient: They are sequential, one-pass methods suited to the nature of the problem, that is, to process information fast in order to exploit investment opportunities as they occur.

The above considerations, together with ideas of asset allocation using regression, enable us to devise suitable techniques for algorithmic trading. We will apply these techniques to real datasets and compare them against established and well-documented asset allocation methods.

1.1 Contribution and Structure 

Online or multi-period portfolio optimization has been investigated in the literature; a non-exhaustive list includes: Agarwal et al. (2006); Chapados (2007); Frauendorfer & Seide (2000); Helmbold et al. (1998); Kuhn et al. (2009); Li & Ng (2000); Smith (1967). Montana et al. (2008, 2009) investigated online algorithms for statistical arbitrage trading strategies. Some of the (computationally fast) portfolio optimization techniques originate in computer science/machine learning, and they are algorithmically distinct from the standard mean-variance type procedures that are often found in the empirical finance literature (as exemplified in the list above). As such, one of the main objectives of this article is to bridge efficient algorithmic techniques found in various disciplines with the long-established portfolio selection literature in finance. The online algorithms are developed here for three reasons:

• For their importance from an applied perspective. 

• To cross-fertilize financial ideas with ideas from signal processing, statistics and computer science, leading to more efficient techniques.

• To illustrate the potential improvements in financial performance. 

There is a substantial number of ideas in the literature cited above which can improve current allocation techniques. However, they appear to be seldom used in empirical finance, and our objective is to provide a simple exposition of these ideas. For example, the constraints typically used in mean-variance problems (e.g. DeMiguel et al. (2009)) correspond to standard Tikhonov regularization and are well understood in signal processing as helping to guard against instability induced by ill-conditioned matrices. Ill-conditioned matrices are often encountered in mean-variance theory because of the multicollinearity of asset log-returns, which may lead to rank-deficient problems (see Hansen (1996)). Moreover, as mentioned earlier, adaptive algorithms have the ability to adapt their estimates to the underlying data, and they are naturally more suitable for non-stationary environments, such as those in finance. In the sequel, we construct two algorithms which are related to batch mean-variance and minimum-variance methodology. These two methods use simple ideas from signal processing and statistics to construct fast and robust approaches to portfolio selection.
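As a small illustration of the point about ill-conditioning (simulated data, not an experiment from this paper), the Python/NumPy sketch below builds two nearly collinear return series and compares the condition number of their second-moment matrix with and without a Tikhonov term $\delta I$. The ridge strength `delta` and the data-generating process are arbitrary assumptions chosen only to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
common = rng.normal(0.0, 0.01, T)           # shared factor driving both assets
# Two assets whose log-returns are almost perfectly collinear (illustrative only)
X = np.column_stack([common, common + rng.normal(0.0, 1e-5, T)])

A = X.T @ X                                  # second-moment matrix X'X
delta = 1e-3 * np.trace(A)                   # arbitrary ridge (Tikhonov) strength

cond_plain = np.linalg.cond(A)
cond_ridge = np.linalg.cond(A + delta * np.eye(2))
print(f"cond(X'X)       = {cond_plain:.2e}")
print(f"cond(X'X + dI)  = {cond_ridge:.2e}")
```

For a positive semi-definite matrix with extreme eigenvalues $\lambda_{\max}$ and $\lambda_{\min}$, the ridge term changes the condition number from $\lambda_{\max}/\lambda_{\min}$ to $(\lambda_{\max}+\delta)/(\lambda_{\min}+\delta)$, which is always smaller for $\delta > 0$.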

This paper is structured as follows. In Section 2 the mean-variance theory and our online framework are introduced. In Section 3 the computation for our two methods is developed. In Section 4 our methods are applied to 4 real datasets and, finally, in Section 5 we conclude the paper, discussing possible avenues of future work.

1.2 Notation and Set-Up 

In this paper, the following notation is adopted. All vectors are column vectors, and we denote the transpose by the prime symbol, i.e. $(\beta')' = \beta$. The column vector of d ones is written $1_d$ and the $d \times d$ identity matrix is written $I_{d\times d}$. Given a collection of d-vectors $x_j,\dots,x_T$, $1 \le j \le T$ say, the $(T-j+1)\times d$ matrix composed of the concatenation of these vectors is written $X_{j:T}$. Denote the Hölder p-norm by $\|\beta\|_p = \left(\sum_{i=1}^{d}|\beta_i|^p\right)^{1/p}$. The trace of a matrix A is written $\mathrm{tr}(A)$.

2 Portfolio Selection 

In the following section we introduce the problem and describe our framework. The log-returns of d financial instruments are observed at times $1, 2, \dots, T$: $x_1,\dots,x_T$, with $x_n = (x_{1,n},\dots,x_{d,n})'$ for $n \in \{1,\dots,T\}$. An investor seeks to construct a portfolio by optimally (in some sense) allocating funds to a collection of d instruments.

2.1 Batch Portfolio Selection 

Most portfolio selection problems are stated in a static or batch manner. For completeness we describe the mean-variance theory (Markowitz (1952)). Denote the mean vector and covariance matrix of the log-returns by $\mu$ and $\Sigma$ respectively. Then the objective is to solve the problem

$$\max_{\beta}\left\{\beta'\mu - \tfrac{1}{2}\beta'\Sigma\beta\right\} \quad \text{s.t.} \quad \beta'1_d = 1,$$

where $\beta$ is the d-vector of portfolio weights. This optimization problem is straightforwardly solved via Lagrange multipliers. In practical situations, the estimated mean and covariance are substituted into the optimization problem, leading to a data-dependent solution. Intrinsically, many of the portfolio optimization problems that are considered in the literature may be written as

$$\max_{\beta}\left\{f(X_{1:T};\beta) + \eta[\beta'1_d - 1]\right\} \quad \text{or} \quad \min_{\beta}\left\{f(X_{1:T};\beta) + \eta[\beta'1_d - 1]\right\}$$

for some function f, Lagrange multiplier $\eta$ and matrix of log-returns $X_{1:T}$. For example, one of the problems in DeMiguel et al. (2009b),

$$\min_{\beta}\left\{\beta'\hat{\Sigma}\beta + \delta\|\beta\|_1 + \eta[\beta'1_d - 1]\right\},$$

corresponds to a minimum-variance portfolio with $L_1$-constraints, where the "hat" notation refers to an estimated quantity. We note that this approach involves constructing a covariance matrix and subsequently computing its inverse to arrive at a solution. As mentioned earlier, d can be very large and this often leads to computational delays. These delays can be detrimental in algorithmic trading, where tick data are streaming and decisions about allocation need to be taken instantly based on the latest information.
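For reference, the batch problem above has a closed form. With $\mathcal{L} = \beta'\mu - \tfrac12\beta'\Sigma\beta + \eta(\beta'1_d - 1)$, setting the gradient to zero gives $\beta = \Sigma^{-1}(\mu + \eta 1_d)$, and the budget constraint fixes $\eta = (1 - 1_d'\Sigma^{-1}\mu)/(1_d'\Sigma^{-1}1_d)$; the minimum-variance case gives $\beta = \Sigma^{-1}1_d / (1_d'\Sigma^{-1}1_d)$. The Python/NumPy sketch below evaluates both with linear solves on simulated returns; the $\tfrac12$ risk scaling follows the objective as written above, and the function names are our own.

```python
import numpy as np

def mean_variance_weights(mu: np.ndarray, Sigma: np.ndarray) -> np.ndarray:
    """Maximise beta'mu - 0.5 beta'Sigma beta subject to beta'1_d = 1 (closed form)."""
    ones = np.ones_like(mu)
    Si_mu = np.linalg.solve(Sigma, mu)            # Sigma^{-1} mu, no explicit inverse
    Si_1 = np.linalg.solve(Sigma, ones)           # Sigma^{-1} 1_d
    eta = (1.0 - ones @ Si_mu) / (ones @ Si_1)    # Lagrange multiplier from the budget constraint
    return Si_mu + eta * Si_1

def min_variance_weights(Sigma: np.ndarray) -> np.ndarray:
    """Minimise beta'Sigma beta subject to beta'1_d = 1."""
    Si_1 = np.linalg.solve(Sigma, np.ones(Sigma.shape[0]))
    return Si_1 / Si_1.sum()

rng = np.random.default_rng(1)
X = rng.normal(0.0005, 0.01, size=(504, 5))       # simulated daily log-returns
mu_hat = X.mean(axis=0)
Sigma_hat = np.cov(X, rowvar=False)
beta_mv = mean_variance_weights(mu_hat, Sigma_hat)
beta_var = min_variance_weights(Sigma_hat)
```

Each call costs $O(d^3)$ for the factorization inside `np.linalg.solve`, which is the per-update cost that the recursive schemes of Section 3 are designed to avoid.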

Another known portfolio allocation technique, which is used throughout this article, is the naive strategy, which assigns equal constant portfolio weights to all instruments in the portfolio (i.e. $1_d/d$). This simple allocation technique is of practical importance, as it has been shown in an empirical study by DeMiguel et al. (2009b) to outperform many more complicated allocation techniques.

2.2 Online Portfolio Selection 

The simple extension that is studied in this paper is to consider:

$$\min_{\beta_n}\left\{f_n(X_{a_n:n};\beta_n) + \eta_n[\beta_n'1_d - 1]\right\}, \quad a_n = 1 \vee (n - W + 1), \quad n \in \{1,\dots,T\} \tag{2.1}$$

where $\vee$ denotes the maximum, $\eta_n$ is a Lagrange multiplier and W is a fixed window of data. That is, the parameters are now estimated over a sliding window W, rather than using all available data. Note that when W = 1, then $a_n = n$, i.e. $X_{n:n}$ is the vector $x_n$. The sliding window is chosen to ensure that our algorithms are of approximately fixed computational complexity per time step (see Section 4.4 for a discussion of window length selection). Note that the larger the sliding window, the more data are used for estimation; conversely, the smaller the sliding window, the more weight is given to recent data. (2.1) includes some interesting special cases, such as

$$f_n(X_{a_n:n};\beta_n) = \|1_W - X_{a_n:n}\beta_n\|_2^2 + \delta\|\beta_n\|_2^2, \tag{2.2}$$

which could be considered a sequential ridge regression, with $\delta$ being the regularization parameter. This latter formulation is equivalent to a mean-variance problem (see Section 2.1) with $L_2$-constraints; see Britten-Jones (1999) for details. Note also that the function in Helmbold et al. (1998) (F in their notation) falls into the framework above.

The reason for giving (2.2) is to provide a link between mean-variance theory and recursive estimation algorithms. As such, we are able to devise recursive asset allocation algorithms, through the use of recursive least squares, for dealing with streaming data, and to take advantage of the many regularization methods developed for regression to deal with the inherent instability of the portfolio solution to estimation error.
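A minimal sketch of (2.1) with $f_n$ as in (2.2), in Python/NumPy: at each time n the most recent W rows of returns are regressed on a vector of ones with a ridge penalty, and the coefficients are then rescaled to sum to one, standing in for the Lagrange multiplier in the spirit of Britten-Jones (1999). This re-solves a batch problem at every step purely for illustration; it is not the recursive scheme of Section 3, and the function names and parameter values are assumptions.

```python
import numpy as np

def ridge_portfolio(X_win: np.ndarray, delta: float) -> np.ndarray:
    """Minimise ||1_W - X b||_2^2 + delta ||b||_2^2, then rescale b so that b'1_d = 1."""
    d = X_win.shape[1]
    ones = np.ones(X_win.shape[0])
    b = np.linalg.solve(X_win.T @ X_win + delta * np.eye(d), X_win.T @ ones)
    return b / b.sum()          # renormalisation in place of the Lagrange multiplier

def sliding_ridge(X: np.ndarray, W: int, delta: float) -> np.ndarray:
    """Weights for every time step n, each using rows a_n..n of X (0-based indexing)."""
    T, d = X.shape
    weights = np.empty((T, d))
    for n in range(T):
        a_n = max(0, n - W + 1)                 # 0-based form of a_n = 1 v (n - W + 1)
        weights[n] = ridge_portfolio(X[a_n:n + 1], delta)
    return weights

rng = np.random.default_rng(2)
X = rng.normal(0.0005, 0.01, size=(300, 4))     # simulated daily log-returns
weights = sliding_ridge(X, W=250, delta=1e-3)
```

The rescaling step breaks down if the unnormalised coefficients sum to (nearly) zero, so a practical implementation would need to guard against that case.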

2.2.1 Objective Functions 

The first case we propose is

$$f_n(X_{a_n:n};\beta_n) = \sum_{i=1}^{n}\lambda^{n-i}\rho(r_i(\beta_n)) \tag{2.3}$$

with W equal to the number of all available observations (so that $a_n = 1$), $\rho: \mathbb{R} \to \mathbb{R}_+$ differentiable and $r_i(\beta_n) = \sum_{j=1}^{d}(1 - x_{j,i}\beta_{j,n})/\sigma_{j,i}$. The parameter $\sigma$ is a scale parameter estimate that is used to standardize the residual error $(1 - x_{j,i}\beta_{j,n})$; we use a robust scale parameter defined later in Section 3.1.1. The parameter $\lambda$ is a forgetting factor; this is a well-known tool in adaptive filtering, e.g. Haykin (1996). The choice of 1 in $r_i(\beta_n)$ follows the work in Britten-Jones (1999). A heuristic explanation is as follows: setting the response variable equal to a positive constant implies that our portfolio is fitted against an ideal portfolio that has positive returns at each time step and is riskless (a constant vector has zero variance).

The objective function (2.3) corresponds to a sequential form of M-estimation (see e.g. Deng (2008) for related ideas). (2.3) follows the recent trend in portfolio optimization to use robust statistical procedures to estimate parameters of interest; see e.g. DeMiguel & Nogales (2009). For reasons that will become apparent, the approach associated with (2.3) is termed robust-exponentially weighted recursive least squares (R-EWRLS).

The second case is:

$$f_n(X_{a_n:n};\beta_n) = \beta_n'(X_{a_n:n}'F_\lambda X_{a_n:n})\beta_n + \delta_n\|\beta_n - 1_d/d\|_2^2 \tag{2.4}$$

where $F_\lambda = \mathrm{diag}(\lambda^{W-1}, \lambda^{W-2}, \dots, 1)$. The task of estimating the $\lambda$ and $\delta$ parameters is discussed later in Section 3.2.1. This corresponds to an online minimum-variance-type algorithm with $L_2$-constraints (termed online minimum-variance (O-VAR) throughout). The matrix $F_\lambda$ introduces a forgetting factor into the optimization scheme. The use of the estimated second moment, instead of the covariance, is for computational reasons; we did not find a substantial discrepancy (in terms of financial performance) when compared to using the covariance matrix. Note that a more standard recursive estimate could be obtained using the function

$$f_n(X_{1:n};\beta_n) = \beta_n'(x_nx_n')\beta_n + \delta_n\|\beta_n - \beta_{n-1}\|_2^2,$$

but this is not considered here, due to the relationship of (2.4) to the standard minimum-variance approach.

The batch version of (2.4) is studied in DeMiguel et al. (2009b). The $L_2$-constraints correspond to an $L_2$ distance from the naive allocation strategy. The naive approach to allocation bypasses estimation of the sample mean, so one would expect relatively stable portfolio weights.

Note that for both procedures there are unknown parameters $\lambda$, $\delta$ and $\sigma$. The next section discusses how these parameters may be set, in addition to the recursive formulation of the proposed optimizations.



3 Updating Schemes 

In this section, we introduce our recursive updating approaches. This section is core to the development of 
the adaptive allocation algorithms as it formulates efficient regression techniques appropriate to the nature 
of algorithmic trading. 



3.1 R-EWRLS 

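Let us introduce some notation. Write $y_{j,i} = x_{j,i}/\sigma_{j,i}$, $y_i = (y_{1,i},\dots,y_{d,i})'$ and $c_i = \sum_{j=1}^{d}1/\sigma_{j,i}$, so that $r_i(\beta_n) = c_i - y_i'\beta_n$, and let

$$q(x) = \frac{1}{x}\frac{d\rho}{dx}(x).$$

Then, ignoring the Lagrange multiplier (the result can be renormalized), we are to minimize (2.3). Differentiating, it follows that the optimal $\beta_n$ solves

$$\sum_{i=1}^{n}\lambda^{n-i}q(r_i(\beta_n))\,c_i\,y_i = \sum_{i=1}^{n}\lambda^{n-i}q(r_i(\beta_n))\,y_iy_i'\,\beta_n.$$

Since this equation is often non-linear, we use the approximation $r_i(\beta_n) \approx r_i(\beta_{i-1})$, with $\beta_{i-1}$ given (i.e. by the previous step, or by initialization). Now, let $z_n$ denote the L.H.S. and $\Phi_n = \sum_{i=1}^{n}\lambda^{n-i}q(r_i(\beta_{i-1}))\,y_iy_i'$; then we are to solve

$$z_n = \Phi_n\beta_n.$$

As $\Phi_n = \lambda\Phi_{n-1} + q(r_n(\beta_{n-1}))\,y_ny_n'$, writing $P_n = \Phi_n^{-1}$, it follows via the Sherman-Morrison formula (e.g. Haykin (1996)) that

$$P_n = \lambda^{-1}P_{n-1} - \lambda^{-1}k_ny_n'P_{n-1}$$

with

$$k_n = \frac{q(r_n(\beta_{n-1}))\,\lambda^{-1}P_{n-1}y_n}{1 + q(r_n(\beta_{n-1}))\,\lambda^{-1}y_n'P_{n-1}y_n}.$$

Using $z_n = \lambda z_{n-1} + q(r_n(\beta_{n-1}))\,c_ny_n$, we thus have the recursion

$$\beta_n = \beta_{n-1} + k_n\,r_n(\beta_{n-1}).$$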

We have presented a recursive least squares procedure whose algebraic equivalence with the Kalman 
filter is well-known and understood (see Chapter 12 of Sayed (2003)). It should be remarked that related 
ideas have appeared in Cipra & Romera (1991) and our approach is similar to robust filters (Martin (1979); 
Masreliez (1975); Schick & Mitter (1994)). 
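A minimal Python/NumPy sketch of one R-EWRLS step follows, assuming $\rho(x) = \log\cosh(x)$ (the choice made in Section 4.2), so that $q(x) = \tanh(x)/x$. It standardizes the new return vector as $y_n = x_n/\sigma_n$ (elementwise), sets $c_n = \sum_j 1/\sigma_{j,n}$, and applies the weighted Sherman-Morrison update $P_n = \lambda^{-1}(P_{n-1} - k_ny_n'P_{n-1})$ with gain $k_n = q\lambda^{-1}P_{n-1}y_n/(1 + q\lambda^{-1}y_n'P_{n-1}y_n)$ and coefficient update $\beta_n = \beta_{n-1} + k_nr_n(\beta_{n-1})$; these follow from $\Phi_n = \lambda\Phi_{n-1} + q\,y_ny_n'$ and $z_n = \lambda z_{n-1} + q\,c_ny_n$. The function names, the initialization ($P_0 = 100I$, naive starting weights) and the constant scale in the usage lines are hypothetical choices; the low-rank step of Section 3.1.2 and the EWMAD scale are omitted.

```python
import numpy as np

def q_logcosh(r: float) -> float:
    """Weight q(r) = rho'(r)/r for rho(r) = log cosh(r); q(0) = 1 by continuity."""
    return 1.0 if abs(r) < 1e-12 else float(np.tanh(r) / r)

def rewrls_step(beta, P, x, sigma, lam):
    """One robust, exponentially weighted RLS update.

    beta  : previous (un-normalised) coefficients beta_{n-1}
    P     : previous inverse matrix P_{n-1} = Phi_{n-1}^{-1} (symmetric)
    x     : new log-return vector x_n
    sigma : scale estimates sigma_{j,n} (e.g. an EWMAD as in Section 3.1.1)
    lam   : forgetting factor in (0, 1]
    """
    y = x / sigma                                     # standardised returns y_n
    c = np.sum(1.0 / sigma)                           # c_n
    r = c - y @ beta                                  # residual r_n(beta_{n-1})
    w = q_logcosh(r)                                  # robust weight, small for large |r|
    Py = P @ y
    k = (w / lam) * Py / (1.0 + (w / lam) * (y @ Py))  # gain vector k_n
    P_new = (P - np.outer(k, Py)) / lam               # uses y'P = (Py)' since P is symmetric
    beta_new = beta + k * r                           # beta_n = beta_{n-1} + k_n r_n
    return beta_new, P_new, w

rng = np.random.default_rng(2)
d, lam = 4, 0.98
beta, P = np.full(d, 1.0 / d), 1e2 * np.eye(d)        # hypothetical initialisation
for x in rng.normal(0.0005, 0.01, size=(300, d)):
    beta, P, _ = rewrls_step(beta, P, x, sigma=np.full(d, 0.01), lam=lam)
weights = beta / beta.sum()                           # renormalise so the weights sum to one
```

Each step costs $O(d^2)$ (matrix-vector products and an outer product), with no matrix inversion.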

3.1.1 Robust Recursive Scale Estimate 

The calculation of the scale parameter is now detailed. Our approach uses robust statistics. First, we note that the Median Absolute Deviation (MAD) (e.g. Huber (2004)) estimate of scale is given by

$$\mathrm{MAD}_V(X_{n-V+1:n}) = \mathrm{med}_j\left(\left|x_{i,j} - \mathrm{med}_l(x_{i,l})\right|\right), \quad j,l \in \{1\vee(n-V+1),\dots,n\},\ i \in \{1,\dots,d\},$$

where V is a chosen data window and $\mathrm{med}(\cdot)$ is the median function. Recent research has pointed to efficient techniques to compute the median with $O(V)$ average complexity using recursive binning schemes (see Tibshirani (2008)).
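A direct (non-recursive) evaluation of the windowed MAD above, as a Python/NumPy sketch: for each asset it takes the median absolute deviation from the window median. It uses `np.median` rather than the binning scheme of Tibshirani (2008), and it does not apply the normal-consistency factor c introduced below; the function name is our own.

```python
import numpy as np

def mad_window(X: np.ndarray, n: int, V: int) -> np.ndarray:
    """Per-asset MAD over rows 1 v (n - V + 1), ..., n (0-based: max(0, n - V + 1), ..., n)."""
    a = max(0, n - V + 1)
    win = X[a:n + 1]                        # V (or fewer) most recent rows, d columns
    med = np.median(win, axis=0)            # med_l(x_{i,l}) for each asset i
    return np.median(np.abs(win - med), axis=0)

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
print(mad_window(X, n=4, V=5))   # prints [1.]: the outlier 100 barely moves the estimate
```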

Second, an exponentially weighted recursive median absolute deviation (EWMAD) estimator is considered,

$$\hat{\sigma}_{i,n} = \nu\,\hat{\sigma}_{i,n-1} + c(1-\nu)\,\mathrm{med}_j\left(\left|x_{i,j} - \hat{\mu}_{i,n}\right|\right), \quad j \in \{1\vee(n-V+1),\dots,n\},$$

where $\nu$ is another forgetting factor and $\hat{\mu}_{i,n}$ is an EWMED (Exponentially Weighted Recursive Median), given by

$$\hat{\mu}_{i,n} = \nu\,\hat{\mu}_{i,n-1} + (1-\nu)\,\mathrm{med}_j(x_{i,j}), \quad j \in \{1\vee(n-V+1),\dots,n\},$$

and $c = 1/\Phi^{-1}(3/4) \approx 1/0.6745$, with $\Phi^{-1}$ the standard normal quantile function, is a correction factor to make the MAD consistent with the normal distribution (e.g. Huber (2004)). The EWMED is similar to the well-documented EWMA (e.g. Hamilton (1994)), the only difference being that the EWMED estimator replaces the latest observation $x_n$ by its median over the sliding window. On the basis of much preliminary investigation on specific datasets, we have set, somewhat arbitrarily, V = 20 and $\nu = 0.99$ for all of the applications. Due to the robust nature of the above estimation, this method is termed robust-exponentially weighted recursive least squares.
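The two recursions can be sketched in Python/NumPy as below: the EWMED mixes the previous location estimate with the median of the current window, and the EWMAD does the same for the scale, using the updated location and the consistency factor $c$ (computed with `statistics.NormalDist`). The defaults V = 20 and $\nu = 0.99$ follow the text; the initial values, the heavy-tailed simulated data and the function name are assumptions.

```python
import numpy as np
from statistics import NormalDist

C_MAD = 1.0 / NormalDist().inv_cdf(0.75)   # c = 1/Phi^{-1}(3/4), roughly 1/0.6745

def ewmed_ewmad_update(mu_prev, s_prev, X, n, V=20, nu=0.99):
    """One step of the EWMED / EWMAD recursions for all d assets at time n (0-based rows)."""
    a = max(0, n - V + 1)                  # window rows 1 v (n - V + 1), ..., n
    win = X[a:n + 1]
    mu = nu * mu_prev + (1.0 - nu) * np.median(win, axis=0)                       # EWMED
    s = nu * s_prev + C_MAD * (1.0 - nu) * np.median(np.abs(win - mu), axis=0)   # EWMAD
    return mu, s

rng = np.random.default_rng(4)
X = rng.standard_t(df=3, size=(500, 3)) * 0.01   # heavy-tailed simulated returns
mu, s = np.zeros(3), np.full(3, 0.01)            # hypothetical initial values
for n in range(X.shape[0]):
    mu, s = ewmed_ewmad_update(mu, s, X, n)
```

The resulting `s` plays the role of the scale $\sigma$ that standardizes the residuals in (2.3).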

3.1.2 Dealing with Noisy Data 

As discussed earlier, financial data are inherently noisy and exhibit a high degree of dependence. The noise hampers the ability to forecast accurately, and the dependence structure of the assets accentuates the problem through the instability of portfolio weights caused by potential rank deficiency, as pointed out in the introduction. To alleviate these problems we adopt a low-rank approximation of $X_{a_n:n}$, for each n, in order to eliminate those components of the data that contain most of the noise. This approach aims to approximate a matrix optimally, with respect to some norm, by a matrix of lower rank with the same dimensions. It is well known that, under the Frobenius norm, the best low-rank approximation can be found by the Singular Value Decomposition (SVD) (see e.g. Stewart (1993)). The approach is as follows.



Let $1 \le r \le n \wedge d$ be given and denote the singular value decomposition (SVD) of the returns matrix by $X_{a_n:n} = U_n\Sigma_nV_n'$, where $\Sigma_n$ holds the singular values $s_{1,n} \ge s_{2,n} \ge \dots$ on its diagonal. Consider the truncated SVD (see Hansen (1987))

$$\Sigma_n^{(r)} = \mathrm{diag}(s_{1,n},\dots,s_{r,n},0,\dots,0),$$

then set $\tilde{X}_{a_n:n} = U_n\Sigma_n^{(r)}V_n'$. We replace $x_n$ in the recursions in Section 3.1 with the final row of $\tilde{X}_{a_n:n}$. The value of r is set during training. Note that the SVD of $X_{a_n:n}$ can be updated incrementally using the methods in Bunch & Nielsen (1978).
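The truncation step can be sketched with `numpy.linalg.svd` as below: zero all but the r largest singular values, rebuild a matrix of the same shape, and take its final row as the denoised $x_n$. This recomputes a full thin SVD each time rather than using the incremental updates of Bunch & Nielsen (1978); the window size, r and the simulated data are illustrative assumptions.

```python
import numpy as np

def low_rank_approx(X_win: np.ndarray, r: int) -> np.ndarray:
    """Best rank-r approximation of X_win in the Frobenius norm (Eckart-Young), via truncated SVD."""
    U, s, Vt = np.linalg.svd(X_win, full_matrices=False)   # X_win = U diag(s) Vt, s descending
    s_r = s.copy()
    s_r[r:] = 0.0                                         # drop the smallest singular values
    return (U * s_r) @ Vt                                 # same shape as X_win, rank <= r

rng = np.random.default_rng(5)
X_win = rng.normal(0.0, 0.01, size=(250, 6))   # W = 250 rows, d = 6 assets (simulated)
X_r = low_rank_approx(X_win, r=2)
x_tilde_n = X_r[-1]                            # final row replaces x_n in the recursion
```

By the Eckart-Young theorem the Frobenius error of this approximation equals $\big(\sum_{k>r}s_{k,n}^2\big)^{1/2}$, the norm of the discarded singular values.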

3.2 Online Minimum-Variance

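The minimum-variance scheme is somewhat less involved. Suppose $\lambda$, $\delta_n$ and W are given. It is straightforward to show that, at time n, the solution of the optimization problem (2.1), with $f_n$ as in (2.4), is

$$\beta_n = \frac{(X_{a_n:n}'F_\lambda X_{a_n:n} + \delta_nI_{d\times d})^{-1}1_d}{1_d'(X_{a_n:n}'F_\lambda X_{a_n:n} + \delta_nI_{d\times d})^{-1}1_d}. \tag{3.5}$$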
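For completeness, the following sketch shows one way to arrive at (3.5), assuming that the $L_2$-constraint in (2.4) is the distance $\|\beta_n - 1_d/d\|_2^2$ to the naive weights; $R_n$ is our shorthand for $X_{a_n:n}'F_\lambda X_{a_n:n}$.

```latex
\begin{align*}
\mathcal{L}(\beta_n,\eta_n) &= \beta_n' R_n \beta_n + \delta_n \|\beta_n - 1_d/d\|_2^2 + \eta_n(\beta_n'1_d - 1),\\
\nabla_{\beta_n}\mathcal{L} &= 2(R_n + \delta_n I_{d\times d})\beta_n - \tfrac{2\delta_n}{d}\,1_d + \eta_n 1_d = 0\\
\Rightarrow\quad \beta_n &= \kappa_n\,(R_n + \delta_n I_{d\times d})^{-1}1_d, \qquad \kappa_n = \tfrac{\delta_n}{d} - \tfrac{\eta_n}{2},\\
\beta_n'1_d = 1 \quad\Rightarrow\quad \kappa_n &= \frac{1}{1_d'(R_n + \delta_n I_{d\times d})^{-1}1_d}.
\end{align*}
```

Since $R_n$ is positive semi-definite and $\delta_n > 0$, the matrix $R_n + \delta_nI_{d\times d}$ is always invertible, which is one practical benefit of the $L_2$-constraint.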

The main objective here is to calculate this quantity quickly. Suppose we are given the eigen-decomposition of $X_{a_n:n}'F_\lambda X_{a_n:n}$, i.e. $X_{a_n:n}'F_\lambda X_{a_n:n} = Q_n\Lambda_nQ_n'$; then the inverse in (3.5) is equal to

$$(X_{a_n:n}'F_\lambda X_{a_n:n} + \delta_nI_{d\times d})^{-1} = Q_n(\Lambda_n + \delta_nI_{d\times d})^{-1}Q_n',$$

that is, one need only calculate the inverse of a diagonal matrix. The recursive calculation of the eigen-decomposition can be achieved by the methods of Yu (1991) in $O(2d^2)$; i.e. this operation is $O(d^2)$ instead of the standard $O(d^{2+\theta})$ ($\theta > 0$) for matrix inversion. More specifically, the method of Yu (1991) re-calculates the eigen-decomposition of $R'$ from that of $R$, where

$$R' = R + \zeta_1\zeta_1' - \zeta_2\zeta_2'$$

with $\zeta_1$, $\zeta_2$ vectors of the appropriate dimension. In our case (for $n \ge W$) we have that

$$X_{a_{n+1}:n+1}'F_\lambda X_{a_{n+1}:n+1} = \lambda\,X_{a_n:n}'F_\lambda X_{a_n:n} + x_{n+1}x_{n+1}' - \lambda^{W}x_{a_n}x_{a_n}',$$

so the same ideas may be applied (the factor $\lambda$ only rescales the eigenvalues). Note that the incremental SVD mentioned above could also be used.
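The Python/NumPy sketch below checks two of these statements numerically on simulated data: that (3.5) can be evaluated from an eigen-decomposition by inverting only the diagonal $\Lambda_n + \delta_nI$, and that sliding the window forward by one row changes $R = X'F_\lambda X$ to $\lambda R + x_{\text{new}}x_{\text{new}}' - \lambda^Wx_{\text{old}}x_{\text{old}}'$. It uses `numpy.linalg.eigh` for a full decomposition and does not implement the $O(d^2)$ updating method of Yu (1991); the function names and parameter values are hypothetical.

```python
import numpy as np

def weighted_second_moment(X_win: np.ndarray, lam: float) -> np.ndarray:
    """R = X' F_lambda X with F_lambda = diag(lam^(W-1), ..., lam, 1) (newest row weighted 1)."""
    W = X_win.shape[0]
    f = lam ** np.arange(W - 1, -1, -1)
    return X_win.T @ (f[:, None] * X_win)

def ovar_weights_eig(Q: np.ndarray, evals: np.ndarray, delta: float) -> np.ndarray:
    """Evaluate (3.5) from R = Q diag(evals) Q'; only the diagonal (evals + delta) is inverted."""
    ones = np.ones(Q.shape[0])
    v = Q @ ((Q.T @ ones) / (evals + delta))   # (R + delta I)^{-1} 1_d
    return v / (ones @ v)                       # divide by 1_d'(R + delta I)^{-1} 1_d

rng = np.random.default_rng(3)
W, lam, delta = 250, 0.97, 1e-4
X = rng.normal(0.0, 0.01, size=(W + 1, 5))      # simulated log-returns, one extra row

R = weighted_second_moment(X[:W], lam)          # window of rows 0..W-1
evals, Q = np.linalg.eigh(R)                    # symmetric eigen-decomposition
beta = ovar_weights_eig(Q, evals, delta)

# Sliding the window forward by one row is a scaled rank-two modification of R
R_next = lam * R + np.outer(X[W], X[W]) - lam ** W * np.outer(X[0], X[0])
```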

3.2.1 Adaptive Calculation of $\delta_n$ and $\lambda$

There are still 2 free parameters to be set: $\delta_n$ and $\lambda$.

First, consider $\delta_n$. Lacking an analytical solution, we investigate $\delta$ numerically based on an initial training period. To investigate the effect of perturbations of $\delta$ on portfolio returns, we choose a short initial training sequence of data to calculate $\mathrm{tr}(X_{a_n:n}'F_\lambda X_{a_n:n})$ for a given $\lambda$. Then, we select a collection of G equally spaced points between $\mathrm{tr}(X_{a_n:n}'F_\lambda X_{a_n:n})/d$ and $\mathrm{tr}(X_{a_n:n}'F_\lambda X_{a_n:n})$. The algorithm is initialized at any of those points. At re-balancing times (the times when the allocation is altered) we compute the portfolio returns over the training period for each of the G points and select the one that generates the largest portfolio return. The range of the grid is based upon the recommendations in Ledoit & Wolf (2004). We found our results to be extremely robust to the initial value of $\delta$.
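A hedged Python/NumPy sketch of the grid search for $\delta$ described above: G equally spaced candidates between $\mathrm{tr}(R)/d$ and $\mathrm{tr}(R)$, each scored by the in-sample portfolio return over the training window. We assume the score is the sum of portfolio log-returns $\sum_t x_t'\beta$ (a common approximation); the function names and the use of a single training window are illustrative assumptions.

```python
import numpy as np

def ovar_weights(R: np.ndarray, delta: float) -> np.ndarray:
    """(3.5): normalised (R + delta I)^{-1} 1_d."""
    v = np.linalg.solve(R + delta * np.eye(R.shape[0]), np.ones(R.shape[0]))
    return v / v.sum()

def select_delta(X_train: np.ndarray, lam: float, G: int = 100):
    """Grid search for delta over G equally spaced points in [tr(R)/d, tr(R)]."""
    W, d = X_train.shape
    f = lam ** np.arange(W - 1, -1, -1)          # diagonal of F_lambda
    R = X_train.T @ (f[:, None] * X_train)       # X' F_lambda X
    grid = np.linspace(np.trace(R) / d, np.trace(R), G)
    # Score (assumed): in-sample sum of portfolio log-returns x_t' beta over the window
    scores = [float(np.sum(X_train @ ovar_weights(R, dl))) for dl in grid]
    return grid[int(np.argmax(scores))], grid

rng = np.random.default_rng(6)
X_train = rng.normal(0.0005, 0.01, size=(250, 5))   # simulated training window
delta_best, grid = select_delta(X_train, lam=0.97, G=100)
```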



Second, consider $\lambda$. In this scenario, we only recalculate $\lambda$ at re-balancing times, which incurs the cost of re-computing the eigen-decomposition of $X_{a_n:n}'F_\lambda X_{a_n:n}$. We follow a procedure similar to those used in adaptive filtering. An attractive criterion for portfolio selection is to minimize

$$\|1_W - X_{a_n:n}\beta_n\|_2,$$

see Britten-Jones (1999). As a result, at the $m$-th re-balancing time, the following stochastic-approximation-type update is used:

$$\lambda_m = \lambda_{m-1} - \frac{1}{ml}\sum_{j=(m-1)l+1}^{ml}\frac{\partial}{\partial\lambda}\left\|1_W - F_\lambda X_{a_j:j}\beta_j\right\|_2^2\Big|_{\lambda=\lambda_{m-1}},$$

where $l$ denotes the number of time steps between re-balancing times. See e.g. Chapter 14 of Haykin (1996) for similar self-tuning approaches for recursive filtering. Note that if $\lambda_m \notin (0,1)$, then we set $\lambda_m = \lambda_{m-1}$.
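The $\lambda$ update can be sketched in Python/NumPy as follows. This rests on our reading of the update: a step size of $1/(ml)$ with $l$ the number of steps between re-balancing times, the derivative taken with each $\beta_j$ held fixed, and a central finite difference in place of an analytic derivative. All of these, and the function names, are assumptions for illustration.

```python
import numpy as np

def criterion(lam: float, X_win: np.ndarray, beta: np.ndarray) -> float:
    """||1_W - F_lambda X beta||_2^2 for one window, with beta held fixed."""
    W = X_win.shape[0]
    f = lam ** np.arange(W - 1, -1, -1)          # diagonal of F_lambda
    resid = np.ones(W) - f * (X_win @ beta)
    return float(resid @ resid)

def update_lambda(lam_prev, m, windows, betas, h=1e-6):
    """lambda_m = lambda_{m-1} - (1/(m l)) * sum_j d/dlambda criterion_j, kept in (0, 1)."""
    l = len(windows)                             # steps since the previous re-balancing
    grad = sum((criterion(lam_prev + h, Xw, b) - criterion(lam_prev - h, Xw, b)) / (2 * h)
               for Xw, b in zip(windows, betas))
    lam_new = lam_prev - grad / (m * l)
    return lam_new if 0.0 < lam_new < 1.0 else lam_prev   # reject values outside (0, 1)

# Toy example: W = 3, one asset with unit returns, beta = 1
X_toy, b_toy = np.ones((3, 1)), np.array([1.0])
lam_10 = update_lambda(0.5, m=10, windows=[X_toy], betas=[b_toy])
```

In the toy example the criterion is $(1-\lambda^2)^2 + (1-\lambda)^2$, whose derivative at $\lambda = 0.5$ is $-2.5$, so the update with $m = 10$, $l = 1$ moves $\lambda$ to $0.75$; with $m = 1$ the proposed value $3$ lies outside $(0,1)$ and the previous value is kept.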

3.3 Discussion 

The two methods described here have some complementary aspects. Firstly, from the perspective of dealing with noisy data, the methods use separate but well-known procedures: R-EWRLS uses the truncated SVD, whilst O-VAR uses a form of Tikhonov regularization via an $L_2$-constraint. Secondly, the R-EWRLS method accounts for outliers by down-weighting them through a weighting quantity ($q$, see Section 3.1) that is a by-product of the robust cost function. On the other hand, O-VAR does not have an embedded mechanism to account for outliers as they occur.

Thirdly, O-VAR is adaptive to non-stationary environments and accounts for variability in the underlying environment through the self-tuning forgetting factor $\lambda$. However, the regularization parameter $\delta$ needs to be set during training. In the R-EWRLS case, $\lambda$ needs to be calibrated in advance, and such calibration needs to take place every time a shift occurs in the underlying environment. Also, the rank r of the low-rank approximation (Section 3.1.2) needs to be set in advance.

It is likely that one procedure will be preferred depending on the scenario. For example, when the data are subject to a change in the economic cycle, one would expect O-VAR to perform significantly better; however, O-VAR does not take expected returns into consideration. In that respect, O-VAR may be more suitable for assets that are expected to grow in the future. For instance, it may be suitable for a fund of funds whose underlying investments have positive expectation and which desires to allocate robustly. Alternatively, it could be suitable for an algorithmic trading system that allocates between allocation strategies in an adaptive and efficient way. Finally, R-EWRLS is linked to mean-variance theory and should be suitable for any asset class and as a standalone allocation strategy. Note that O-VAR is similar to a more efficient version of the function in DeMiguel et al. (2009b).

4 Application 

The techniques described in Section 3 are applied to 4 datasets. Financial performance is compared to standard methods. Note that a zero risk-free interest rate is assumed throughout.

4.1 Data Description 

We perform our analysis on 4 datasets: spot Foreign Exchange (FX), constituents of the DJ Euro Stoxx 50, portfolios of NYSE, NASDAQ and AMEX stocks, and constituents of the FTSE-100 (see Figure 1).

Our first dataset consists of 19 spot currencies quoted against the American dollar. For ease of interpretation, we use the convention "USD/...", where USD is always the base rate and is read "units of foreign currency per 1 USD". The dataset covers a period of approximately 5½ years of daily data, from 01/10/2002 until 12/03/2008. The spot data have been obtained from the "FXHistory" functionality of OANDA (www.oanda.com).

Figure 1: Price Data. Note that the DJ Stoxx prices have been scaled by 10.

The second dataset consists of 43 constituents of the DJ Euro Stoxx 50, with approximately 5 years of closing prices, from 21/10/2002 until 13/09/2007. The data have been obtained from Yahoo (http://uk.finance.yahoo.com/) and have been adjusted for discontinuities related to financial events, such as stock splits and bonus issues.

The third dataset consists of the daily returns on 25 portfolios formed on size and book-to-market from NYSE, NASDAQ and AMEX. The data are from 01/07/63-31/12/08 and were obtained from http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html.

Our final dataset consists of 6 constituents (BA, Barclays, Lloyds TSB, M&S, RBS, Tesco) of the FTSE-100 share index. The daily data are adjusted closing prices from 17/07/04-17/07/09, also obtained from Yahoo. These particular data are of interest for observing the performance of relatively simple allocation schemes during 2 financial crises: the selloff in 2006 caused by algorithmic trading and the sub-prime mortgage crisis in 2008. It should be noted that some of our data are clearly subject to survivorship bias; one should take this into account when looking at the performance measures.



4.2 The Allocation Strategies 

In our comparison, in addition to the methods developed in Section 3, we consider 3 standard batch strategies:

• NAIVE. This allocates funds in equal amounts to each asset. As noted in DeMiguel et al. (2009a), this strategy provides an important benchmark despite its simplicity.

• Mean-Variance (M-VAR). This is the standard mean-variance methodology. To remove any numerical difficulties with inversion, as noted in the introduction, the covariance matrix is replaced by $\hat{\Sigma} + \vartheta I_{d\times d}$, where $\vartheta$ is some non-negative constant. The regularization parameter is chosen as $\vartheta = \mathrm{tr}(\hat{\Sigma})$, similar to Ledoit & Wolf (2004).

• Minimum-Variance (VAR). Standard minimum-variance methodology, with the covariance replaced as for M-VAR.

For R-EWRLS, $\rho(x) = \log\{\cosh(x)\}$. If one could interpret the procedure as a regression, this would imply a hyperbolic secant error distribution (see Benesty & Gansler (2001)). We experimented with more standard choices of $\rho$ (e.g. Huber's loss function, see Huber (2004)) but did not find that this significantly affected our conclusions. Note that we implemented the method of Helmbold et al. (1998), but did not find a significant difference from the NAIVE strategy.
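To show what the weight $q(x) = \rho'(x)/x$ does inside the R-EWRLS recursion, the Python/NumPy sketch below compares it for the log-cosh loss used here ($q(x) = \tanh(x)/x$) and for Huber's loss ($q(x) = 1$ for $|x| \le k$, $k/|x|$ otherwise). The Huber threshold k = 1.345 is a conventional tuning constant, not a value taken from this paper.

```python
import numpy as np

def q_logcosh(r):
    """q(r) = rho'(r)/r = tanh(r)/r for rho(r) = log cosh(r), with q(0) = 1."""
    r = np.asarray(r, dtype=float)
    safe = np.where(r == 0.0, 1.0, r)                  # avoid 0/0 at the origin
    return np.where(r == 0.0, 1.0, np.tanh(safe) / safe)

def q_huber(r, k=1.345):
    """q(r) for Huber's loss: 1 inside [-k, k], k/|r| outside."""
    return k / np.maximum(np.abs(np.asarray(r, dtype=float)), k)

residuals = np.array([0.0, 0.5, 2.0, 10.0])
print(q_logcosh(residuals))
print(q_huber(residuals))
```

Both weights equal (or are close to) 1 for small residuals and decay like $1/|r|$ for large ones, so large residuals contribute less to $\Phi_n$ and $z_n$.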

4.3 Comparison Criteria 

In order to compare and investigate our strategies, we consider various criteria. The basic idea is to initialize all of the strategies in some way; the first 2 years (504 data points) of each dataset are used for training (i.e. omitted afterwards). In particular, and helping to avoid look-ahead bias, the M-VAR and VAR strategies use the first 2 years of data to estimate the portfolio weights, and these are used until the first re-balancing instant. The re-balancing instants are then determined going forward by the re-balancing window W, i.e. re-balancing every 250 data points. Then the data up to the last re-balancing period are used to re-estimate the weights. The weights are initialized as 1 and are employed from day 2 onwards. Note that the weights actually used to compute portfolio returns are based only upon those calculated at re-balancing times. That is to say, we update the weights for the online methods, but only employ new weights at re-balancing times. As such, trading is infrequent and the transaction costs associated with these allocation strategies are negligible. Therefore, we refrain from modelling transaction costs, as this would introduce another layer of assumptions: transaction costs often differ substantially from firm to firm, given each firm's "bargaining" power to negotiate down trading commissions.

The criteria employed are standard in financial applications. The returns for each day are calculated, and we consider: annualized returns and volatility, Sharpe ratio, % average daily gain and loss, % of winning trades (WT), maximum draw-down (MDD) and turnover (TO). Of these, perhaps the last 2 need a little explanation. The maximum draw-down is equal to

$$-\min\{v_1,\ v_1+v_2,\ \dots,\ v_1+\dots+v_t,\ v_2,\ v_2+v_3,\ \dots,\ v_2+\dots+v_t,\ \dots,\ v_t\},$$

where $v_i$ is the percentage return at period i. In words, it is the maximum movement from peak to trough of the cumulative returns, in percentage terms. The turnover is a measure of the frequency of trading: it is the average of the absolute difference of the portfolio weights between re-balancing times.
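A Python/NumPy sketch of these criteria follows. The maximum draw-down is computed exactly as the formula above (the negated smallest sum over contiguous blocks of returns) with a single linear scan. Assumptions not fixed by the text: 252 trading days per year for annualization, the sample standard deviation, and turnover taken as the sum over assets of absolute weight changes, averaged over consecutive re-balancing times.

```python
import numpy as np

def max_drawdown(v) -> float:
    """-min over all contiguous blocks i..j of (v_i + ... + v_j), via one linear scan."""
    run, worst = 0.0, np.inf
    for x in v:
        run = min(x, run + x)          # smallest block sum ending at the current period
        worst = min(worst, run)
    return -worst

def performance_summary(v, weights_at_rebal, periods_per_year=252):
    """Criteria of Section 4.3 for daily percentage returns v (risk-free rate taken as zero)."""
    v = np.asarray(v, dtype=float)
    ann_ret = v.mean() * periods_per_year
    ann_vol = v.std(ddof=1) * np.sqrt(periods_per_year)
    w = np.asarray(weights_at_rebal, dtype=float)   # one row of weights per re-balancing time
    turnover = np.abs(np.diff(w, axis=0)).sum(axis=1).mean() if len(w) > 1 else 0.0
    return {
        "ann_return": ann_ret,
        "ann_vol": ann_vol,
        "sharpe": ann_ret / ann_vol,
        "avg_gain": v[v > 0].mean() if np.any(v > 0) else 0.0,
        "avg_loss": v[v < 0].mean() if np.any(v < 0) else 0.0,
        "WT": 100.0 * np.mean(v > 0),
        "MDD": max_drawdown(v),
        "TO": turnover,
    }

summary = performance_summary([1.0, -2.0, 3.0, -4.0, 1.0],
                              [[0.5, 0.5], [0.7, 0.3], [0.6, 0.4]])
```

In the usage line the most negative block is the single return $-4$, so the maximum draw-down is 4 (in percentage terms).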

4.4 Initialization 

We now discuss the selection of parameters for the R-EWRLS approach. We explore the Sharpe ratio for the spot FX and DJ Euro Stoxx 50 datasets over a grid of equally spaced values of the parameter r and the forgetting factor $\lambda$. The results of the exploratory analysis are depicted by means of contour plots (Figure 2).

For the R-EWRLS allocation strategy on the equities data, Figure 2 shows that the Sharpe ratio is
positive throughout the parameter space. There is a clear pattern: lower values of r give a higher Sharpe
ratio, and the difference becomes more pronounced for higher values of λ. For the FX dataset, Figure 2
shows distinct regions of higher Sharpe ratio in the parameter space, suggesting a dependence on both λ
and r. In particular, the best performance is achieved for λ approximately between 0.94 and 0.98 when r
is greater than 4. We select the values of r and λ on the basis of such plots.

Figure 2: Sensitivity of the Sharpe ratio to perturbations of the parameters of the R-EWRLS method for
DJ Stoxx (left panel) and spot FX (right panel). The parameter λ is the forgetting factor and the parameter
r is the rank of the low-rank data approximation. We set W = 250.
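The exploratory analysis above amounts to evaluating the Sharpe ratio on a grid of (r, λ) values and keeping the best cell. The minimal Python sketch below illustrates this; `sharpe_of` is a hypothetical stand-in for running R-EWRLS over the training period, and the toy surface in the usage lines is invented purely for illustration.

```python
from itertools import product


def equally_spaced(lo, hi, num):
    """num equally spaced values from lo to hi inclusive (num >= 2)."""
    step = (hi - lo) / (num - 1)
    return [lo + k * step for k in range(num)]


def grid_search(sharpe_of, ranks, lambdas):
    """Return (best_sharpe, r, lam) and the full grid of Sharpe ratios.

    sharpe_of(r, lam) is a hypothetical callable that runs the allocation
    strategy on the training data and returns its Sharpe ratio.
    """
    surface = {(r, lam): sharpe_of(r, lam) for r, lam in product(ranks, lambdas)}
    (r_best, lam_best), s_best = max(surface.items(), key=lambda kv: kv[1])
    return (s_best, r_best, lam_best), surface


if __name__ == "__main__":
    # Toy surface peaking at r = 5, lam = 0.96 (illustrative only).
    toy = lambda r, lam: -((r - 5) ** 2) - 100.0 * (lam - 0.96) ** 2
    best, _ = grid_search(toy, range(1, 11), equally_spaced(0.90, 0.98, 5))
    print(best)
```

The returned `surface` dictionary holds every evaluated cell, which is the information a contour plot such as Figure 2 displays; inspecting it, rather than only the single best cell, shows whether the optimum sits in a broad, stable region.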

The choice of W is important for both of our methods and, to an extent, interacts with λ: instead of
making λ large, W can be made smaller, and vice versa. However, the choice of W is also a computational
issue, since we may only want to allocate a fixed amount of memory to storing the data. This is the line we
follow: we set W = 250 (approximately 1 year of trading), which is not too large for computational purposes
and, for the purposes of portfolio selection, does not interfere substantially with the data memory profile
implied by λ. That is, the exponential decay profile is truncated only for observations more than 250
periods old. The role of λ in forgetting the data is then far clearer.
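To make the interplay between W and λ concrete: with exponential forgetting, an observation k periods old receives weight λ^k, so the effective memory is roughly 1/(1 - λ) periods, and a window of length W discards a fraction λ^W of the total (infinite-horizon) weight. The short Python sketch below computes both quantities; these are standard properties of exponential weighting, shown as an illustration, and the function names are hypothetical.

```python
def effective_memory(lam):
    """Sum of the weights lam**k for k = 0, 1, 2, ...: roughly how many periods are remembered."""
    return 1.0 / (1.0 - lam)


def truncated_fraction(lam, window):
    """Share of the total exponential weight on observations older than the window.

    Keeping lags k = 0, ..., window - 1 retains 1 - lam**window of the weight,
    so the discarded share is lam**window.
    """
    return lam ** window


if __name__ == "__main__":
    for lam in (0.75, 0.8, 0.94, 0.98):
        print(lam, round(effective_memory(lam), 1), truncated_fraction(lam, 250))
```

For example, with λ = 0.98 and W = 250 the discarded share is about 0.0064, and for λ = 0.8 it is far below 10^-20, so in both cases the window leaves the decay profile implied by λ essentially intact.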



4.5 Numerical Results 

The algorithms were run with re-balancing every 50, 150 and 250 days. On the basis of training, R-EWRLS
used r = 5 for the first 2 datasets, r = 23 for the third and r = 2 for the fourth, with, respectively,
λ ∈ {0.8, 0.8, 0.8, 0.75}. For O-VAR, G = 100 (see Section [SXI]) and the initial λ = 0.05. Note that in
each instance the forgetting factor converged close to 1 (implying very little forgetting) when there were
sufficient re-balancing periods.

We compared the computational speed of the batch mean-variance optimization approach with that of our
methods, coding all methodologies in Matlab (version 7.4). For a data matrix of dimension 1000 × 500, one
iteration of our online methods takes approximately 15 milliseconds, compared with 2 seconds for the batch
mean-variance computation. In a separate experiment, we increased the number of rows from 1000 to 5000,
and the batch computation time increased to 6 seconds.

The financial results are reported in Tables 1-3. Some of the annualized volatilities in the tables are rather
high and would be unrealistic for an investor, but the comparison remains valid because the Sharpe ratio
adjusts for the volatility of the underlying strategy. However, one should be cautious when comparing the
maximum draw-downs of allocation strategies, as these depend on the volatility of the underlying strategy.
Let us consider each dataset in turn.
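The speed advantage of online methods comes from updating quantities recursively rather than re-solving the batch problem. As an illustration only (this is the textbook exponentially weighted recursive update, not the R-EWRLS or O-VAR recursions themselves), the Python sketch below maintains the inverse P of an exponentially weighted second-moment matrix S with the Sherman-Morrison identity and reads off global minimum-variance weights w = P1 / (1ᵀP1). For simplicity it uses the uncentred second moment of returns in place of a covariance; the function names and the initialization S = δI are assumptions.

```python
def sherman_morrison_update(P, x, lam):
    """Return the inverse of lam * S + x x^T, given P = S^{-1} with S symmetric.

    Sherman-Morrison with forgetting factor lam:
        (lam*S + x x^T)^{-1} = (P - (P x)(P x)^T / (lam + x^T P x)) / lam
    Cost is O(n^2) per new observation; no matrix is inverted.
    """
    n = len(x)
    px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * px[i] for i in range(n))
    return [[(P[i][j] - px[i] * px[j] / denom) / lam for j in range(n)]
            for i in range(n)]


def min_variance_weights(P):
    """Weights proportional to P 1 (row sums of the symmetric P), summing to one."""
    row_sums = [sum(row) for row in P]
    total = sum(row_sums)
    return [s / total for s in row_sums]


if __name__ == "__main__":
    delta = 1.0                                   # assumed initial S = delta * I
    P = [[1.0 / delta, 0.0], [0.0, 1.0 / delta]]
    for x in ([1.0, 2.0], [0.5, -0.3], [-0.2, 0.4]):  # toy return vectors
        P = sherman_morrison_update(P, x, lam=0.98)
    print(min_variance_weights(P))
```

A single update touches each entry of P a constant number of times, so its cost grows as n², whereas rebuilding and inverting the matrix from scratch costs on the order of n³; this illustrates the type of saving that recursive methods offer over batch recomputation.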
4.5.1 Spot FX Results

Our first observation is that only the R-EWRLS method consistently produces positive returns, and this
holds across the different re-balancing periods. Together with Figure 2, this suggests that for particularly
noisy data the truncated SVD has a beneficial effect on the computation of the portfolio weights. This is in
contrast to M-VAR, whose performance is sensitive to changes in W, a result in line with those reported in
the literature. We also note that the portfolio weights of R-EWRLS are more "active", as indicated by their
turnover. This may also indicate that R-EWRLS adapts better to the underlying environment, given that it
delivers consistently better performance than M-VAR. O-VAR, owing to its similarity to the NAIVE strategy,
is unable to provide positive returns, and NAIVE itself performs particularly badly here. This is because
the NAIVE allocation strategy takes only long positions and is expected to benefit from the long-term growth
typically exhibited by equities, which FX spot prices do not necessarily exhibit.

4.5.2 DJ Euro Stoxx 50 Results 

The second dataset displays a more familiar pattern, i.e. one often reported in the literature. The NAIVE
and VAR strategies perform relatively well, with quite favourable Sharpe ratios given the simplicity of the
strategies. The O-VAR method performs marginally better than the VAR strategy, but with a noticeable
increase in turnover. R-EWRLS also delivers satisfactory performance and outperforms M-VAR.

4.5.3 Portfolio Data 

The portfolio data provide some very interesting results. In this case O-VAR provides the most impressive
results from a financial perspective, although its performance tends to decrease as the re-balancing period
increases. The success of the O-VAR method is linked to several factors. Firstly, owing to its similarity to
NAIVE, this method is likely to fare very well; see Figure 1 and the remarks in Section 4.5.1. Secondly,
O-VAR should fare well because all of its parameters adapt to the data. In contrast, R-EWRLS is trained
only on the first 2 years of data; since the data span 45 years, 2 years is clearly insufficient to train the
algorithm. Although this is a little unfair (in practice, e.g., the parameters would be retrained every 5 years),
it highlights a small deficiency of the R-EWRLS method. Thirdly, in contrast to the VAR method, the
smoothness of the O-VAR portfolio weights is regulated by Tikhonov regularization. This may improve
performance through better estimation (see the Introduction for the discussion of rank deficiency).
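One common way to add Tikhonov regularization to a minimum-variance allocation is to penalize δ‖w‖² alongside the portfolio variance, which gives w ∝ (Σ + δI)⁻¹1. The Python sketch below uses that assumed form (an illustration, not the exact O-VAR objective; δ and the function names are hypothetical) to show that as δ grows the weights shrink towards the equal weights 1/N of the NAIVE strategy, one way to see why a Tikhonov-regularized allocation can resemble NAIVE.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_min_variance(cov, delta):
    """argmin_w  w^T cov w + delta * w^T w  subject to  sum(w) = 1."""
    n = len(cov)
    A = [[cov[i][j] + (delta if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    z = solve(A, [1.0] * n)  # z = (cov + delta I)^{-1} 1
    total = sum(z)
    return [zi / total for zi in z]


if __name__ == "__main__":
    cov = [[0.04, 0.0], [0.0, 0.01]]  # toy covariance, illustrative only
    for delta in (0.0, 0.01, 1.0, 100.0):
        print(delta, ridge_min_variance(cov, delta))
```

With δ = 0 the toy covariance gives weights (0.2, 0.8), concentrating capital in the lower-variance asset; as δ increases the weights approach (0.5, 0.5), the NAIVE allocation.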

4.5.4 FTSE-100 

The final dataset provides an interesting set of results. Owing to a variety of economic, cultural
(business-wise) and investor-related factors, many quantitative equity hedge funds have performed poorly
during the current financial crisis. It is therefore of interest from an applied perspective to observe how our
models behave in such a difficult trading period. Unsurprisingly, many of the strategies perform badly.
However, in 2 instances our online methods provide positive returns: O-VAR with re-balancing every 150
days and R-EWRLS with re-balancing every 250 days. This is encouraging, as it suggests, to an extent, that
the ability to process data as it arrives and adapt strategies accordingly is more useful in practice than
standard batch methods.

4.5.5 General Comments 

On the basis of our investigations, we make the following observations: 
Data  Method    % gain  % loss     MDD    % WT      TO  Ann.R.  Ann.V.  Sharpe
1     O-VAR       0.17   -0.17   14.50   48.08    0.59   -1.34    4.08   -0.33
      VAR         0.20   -0.22   18.09   49.70   0.074   -1.94    4.58   -0.43
      R-EWRLS     0.49   -0.51    8.59   53.10   19.51    6.22   10.90    0.57
      M-VAR       1.52   -1.65  221.28   48.96   13.98  -23.22   44.93   -0.52
      NAIVE       0.22   -0.23   17.89   49.43       -   -1.89    4.91   -0.38
2     O-VAR       0.55   -0.54    9.93   55.35    0.88   15.18   11.35    1.34
      VAR         0.58   -0.58   10.78   55.41    0.08   15.14   12.40    1.26
      R-EWRLS     0.62   -0.68   12.91   56.23  168.26   13.82   13.63    1.01
      M-VAR       1.67   -1.53   49.44   49.36   10.23   11.98   34.41    0.35
      NAIVE       0.60   -0.59   11.10   54.63       -   16.12   12.72    1.27
3     O-VAR       0.38   -0.38   52.89   58.81    1.70   16.16    8.90    1.82
      VAR         0.56   -0.63   67.90   56.94    0.08   12.11   13.92    0.87
      R-EWRLS     4.70   -4.00 1281.34   50.76   73.96  107.16  428.92    0.25
      M-VAR       1.58   -1.61  240.56   55.96   12.40   43.88   42.01    1.04
      NAIVE       0.58   -0.65   67.86   56.67       -   11.93   14.30    0.84
4     O-VAR       1.73   -1.73   65.74   49.43    0.99   -4.99   41.19   -0.12
      VAR         1.72   -1.18  139.89   50.19    0.09  -11.50   43.66   -0.26
      R-EWRLS     9.27   -8.53  890.60   47.91   15.15   -1.06  342.45  -0.003
      M-VAR       2.37   -2.38  182.45   47.78    4.56  -28.32   55.01   -0.51
      NAIVE       1.84   -1.95  155.03   50.32       -  -12.08   47.51   -0.25

Table 1: Algorithm performance across datasets. The portfolios are re-balanced every 50 days. % gain and
% loss are the average daily gain and loss, MDD the maximum draw-down, % WT the percentage of winning
trades, TO the turnover, and Ann.R. and Ann.V. the annualized return and volatility. See Section SJ for details.



1. The R-EWRLS method can be successful (positive returns) for noisy data. However, when the
initial training period is insufficient or unreliable, very unstable results are obtained. In addition, high
turnovers were observed for this method.

2. The online and adaptive nature of the O-VAR method, coupled with its link to the NAIVE strategy,
leads consistently to strong performance in comparison with the other methods tested here.

In terms of the first point, the R-EWRLS approach is related to M-VAR procedures, which can work well
when there is a detectable drift signal in the data. Combining this with the robust scale computation and
noise reduction yields a potentially superior method. However, the method has a number of free parameters
that must be set. As a result, significant training is required, and the success of the method relies on this
training procedure.

The second point is clearly reflected in Tables 1-3. O-VAR alleviates the drawbacks of the R-EWRLS
method, but its relation to the NAIVE strategy, which makes a naive assumption about the direction of the
market by holding long-only positions, is a potential deficiency. This can lead to poor performance, e.g. for
the FX spot data.



5 Summary 

We have derived two efficient methods to compute portfolio weights online without the need for matrix
inversion. We compared the two methods with existing techniques in portfolio optimization using 4 datasets.
Data  Method    % gain  % loss     MDD    % WT      TO  Ann.R.  Ann.V.  Sharpe
1     O-VAR       0.21   -0.22   19.21   49.02    0.28   -2.13    4.80   -0.44
      VAR         0.21   -0.22   18.96   49.56    0.07   -2.11    4.66   -0.45
      R-EWRLS     0.59   -0.62   21.40   53.43   30.45    5.69   13.62    0.42
      M-VAR       0.64   -0.67   24.67   51.11    5.55    0.89   14.87    0.06
2     O-VAR       0.59   -0.57   10.10   54.84    0.40   16.22   12.30    1.31
      VAR         0.58   -0.58   10.77   55.28    0.07   15.92   12.41    1.28
      R-EWRLS     0.75   -0.83   15.11   56.42   42.47   17.03   16.74    1.02
      M-VAR       1.02   -1.01   39.79   52.84    5.07   15.33   22.98    0.67
3     O-VAR       0.40   -0.43   67.96   59.23    2.40   16.54   10.00    1.65
      VAR         0.56   -0.63   67.84   56.86    0.07   12.15   13.97    0.87
      R-EWRLS     8.41   -7.14 2238.43   50.84  133.83  193.41  662.89    0.29
      M-VAR       1.06   -1.11  253.93   56.69    7.31   31.04   31.03    0.97
4     O-VAR       2.24   -2.29   97.52   51.33    1.70    9.90   62.05    0.16
      VAR         1.75   -1.83  140.23   49.81    0.07  -11.25   44.08   -0.26
      R-EWRLS     5.53   -5.19  357.94   47.91   13.11  -13.37  161.49   -0.08
      M-VAR       1.82   -1.76   89.99   49.05    2.21   -0.01   40.32   -0.02

Table 2: Algorithm performance across datasets. The portfolios are re-balanced every 150 days. See
Section H31 for details.



Data  Method    % gain  % loss     MDD    % WT      TO  Ann.R.  Ann.V.  Sharpe
1     O-VAR       0.21   -0.22   20.07   49.49    0.33   -2.30    4.66   -0.49
      VAR         0.21   -0.22   18.68   49.63    0.07   -2.04    4.71   -0.43
      R-EWRLS     0.45   -0.48   11.66   53.67   16.48    5.60   10.00    0.56
      M-VAR       0.52   -0.53   35.83   49.43    4.97   -0.35   12.11   -0.29
2     O-VAR       0.57   -0.57    9.97   55.22    0.71   16.23   12.08    1.34
      VAR         0.59   -0.57   10.70   55.03    0.06   15.90   12.42    1.28
      R-EWRLS     0.74   -0.79   14.46   55.85   26.79   17.60   16.08    1.10
      M-VAR       0.91   -0.91   22.92   53.99    3.57   19.29   19.51    0.99
3     O-VAR       0.44   -0.46   60.40   58.21    2.43   15.87   10.70    1.48
      VAR         0.57   -0.64   67.57   56.91    0.08   12.11   13.99    0.87
      R-EWRLS     1.98   -1.93  216.25   52.04   29.89   25.45   58.39    0.44
      M-VAR       0.86   -0.93  111.61   55.99    5.91   18.37   22.14    0.83
4     O-VAR       1.71   -1.88  155.03   50.95    1.21  -12.00   43.27   -0.28
      VAR         1.71   -1.87  146.86   49.81    0.09  -12.11   45.47   -0.27
      R-EWRLS     3.26   -2.85  151.37   48.42    3.77   26.84   72.74    0.37
      M-VAR       1.68   -1.63   77.15   48.92    1.61   -3.60   37.18   -0.10

Table 3: Algorithm performance across datasets. The portfolios are re-balanced every 250 days. See
Section H31 for details.
We showed that our strategies predominantly outperform the benchmarks when performance is measured
by the Sharpe ratio (the benchmarks include the method of Helmbold et al. (1998)).

Future research could focus on extending our approach to include transaction costs (bid-ask spread and
commission) as a function of the portfolio weights, and on adaptive re-balancing strategies
(e.g. Baltutis, 2009). For example, the O-VAR method does not explicitly incorporate previous weights in
its estimate and, as such, can lead to high turnovers. In addition, future work could focus on making
R-EWRLS fully adaptive. This requires the online selection of the number of singular values and lies
at the interface of statistics, finance, signal processing and computer science. Finally, one of the drawbacks
of the O-VAR method is its relation to the NAIVE allocation strategy. This could be removed, for example,
by using L1-type constraints, leading to an online lasso (Tibshirani (1996)) method (see e.g. Anagnostopoulos
et al. (2008)). In this context, since the portfolio weights are required to sum to one, standard pathwise
coordinate optimization (Friedman et al. (2007)) does not apply, and we are left with an online quadratic
programming problem. To our knowledge, with the exception of Zhang & Li (2009), there is little
methodology for this problem; we are currently working towards a solution. Our work also opens up
interesting theoretical questions, e.g. investigating the sensitivity of the portfolio weights of online
algorithms (as in DeMiguel & Nogales (2009)).